Hate speech recognition in multilingual text: Hinglish documents
Abstract
The Internet is a boon for mankind, but its misuse has been increasing drastically. Social networking platforms such as Facebook, Twitter and Instagram play a predominant role in letting users express their views. Users sometimes wield abusive or inflammatory language that may provoke readers. This paper aims to evaluate various machine learning and deep learning techniques to detect hate speech on social media in Hinglish (English-Hindi code-mix) language. In this paper, we apply several machine learning and deep learning methods, along with feature extraction and word-embedding techniques, to a consolidated dataset of 20,600 instances for hate speech detection from tweets and comments in Hinglish. The experimental results reveal that deep learning models perform better than machine learning models in general. Among the deep learning models, the CNN-BiLSTM model with word2vec word embedding provides the best results. It yields 0.876 accuracy, 0.830 precision, 0.840 recall and 0.835 F1-score. These results surpass recent state-of-the-art approaches.
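As a quick consistency check on the figures reported above, the F1-score can be recomputed from the stated precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the abstract does not say how the scores were averaged across classes, and the function and variable names are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the CNN-BiLSTM model with word2vec.
reported_precision = 0.830
reported_recall = 0.840
reported_f1 = 0.835

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.3f}")  # prints "computed F1 = 0.835"
```

Under that assumption, the recomputed value agrees with the reported 0.835 to three decimal places.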
Similar resources
Character-Based Handwritten Text Recognition of Multilingual Documents
An effective approach to transcribe handwritten text documents is to follow a sequential interactive approach. During the supervision phase, user corrections are incorporated into the system through an ongoing retraining process. In the case of multilingual documents with a high percentage of out-of-vocabulary (OOV) words, two principal issues arise. On the one hand, a minor yet important matte...
Multilingual Speech Recognition
We present two concepts for systems with language identification in the context of multilingual information retrieval dialogs. The first one has an explicit module for language identification. It is based on training a common codebook for all the languages and integrating over the output probabilities of language-specific n-gram models trained over the codebook sequences. The system can decide f...
Multilingual Speech Recognition
The speech-to-speech translation system Verbmobil requires a multilingual setting. This consists of recognition engines in the three languages German, English and Japanese that run in one common framework together with a language identification component which is able to switch between these recognizers. This article describes the challenges of multilingual speech recognition and presents diffe...
Clustering multilingual documents by estimating text-to-text semantic relatedness
This thesis is about multilingual document clustering through estimating semantic relatedness between multilingual texts. Specifically, we focus on the task of clustering multilingual documents with very limited or no supervisory information. We present two approaches to address the problem: a comparable-corpora based approach and a web-searches based approach. Our first approach derives pairwi...
Journal
Journal title: International Journal of Information Technology
Year: 2023
ISSN: 2511-2112, 2511-2104
DOI: https://doi.org/10.1007/s41870-023-01211-z